Skip to content

feat: Client-tool execution + permission system#3370

Draft
BinaryMuse wants to merge 38 commits intomainfrom
mkt/tool-perms
Draft

feat: Client-tool execution + permission system#3370
BinaryMuse wants to merge 38 commits intomainfrom
mkt/tool-perms

Conversation

@BinaryMuse
Copy link
Copy Markdown
Member

@BinaryMuse BinaryMuse commented Mar 31, 2026

Summary

Adds client-side tool execution to Atuin AI, starting with atuin_history. The server can request tool calls, which are executed locally with a permission system, and results are sent back to continue the conversation.

Architecture

  • Modular extraction: Decomposed the monolithic inline.rs into stream.rs (SSE framing), dispatch.rs (event handling), state.rs (domain state), tools/ (tool types + execution), and permissions/ (rule resolution)
  • ToolTracker: Single source of truth for tool execution lifecycle — replaces the previous split between pending_tool_calls and ConversationEvent preview fields. Tracks each tool from permission check through completion, including cached previews for shell command output
  • Permission system: File-based permission rules with tree-sitter shell command parsing for scoped matching (e.g. git * allows all git subcommands). Supports auto-allow, deny, and interactive ask with an inline selection UI
  • Shell command execution: VT100 terminal emulator for live output preview in a Viewport widget, with interrupt support (Ctrl+C)

Ships with

  • atuin_history tool active by default — lets the AI search your command history
  • Other tools (read_file, execute_shell_command, str_replace, file_create, file_insert) are defined but gated behind capability flags
  • Permission denied flows surface errors inline on the tool call rather than as separate system messages
  • Markdown list rendering fix (items no longer break across lines)
  • Integrates main's new Hub auth flow and ai.opening settings (send_cwd, send_last_command)

…oundary

This change extracts four new abstractions and sets a clear architectural
rule that will guide subsequent refactoring:

**Mutation boundary (Phase 1a):** AppState no longer owns an
mpsc::Sender. State is passive data; the event loop (or stream task) is
the only thing that emits events. handle_client_tool_call is now a pure
mutation — the CheckToolCallPermission event is sent by the caller.

**AppContext (Phase 1b):** Session-scoped API configuration (endpoint,
token, send_cwd) is now a single Clone struct instead of three
individually-cloned bindings in the event loop.

**ClientContext + ChatRequest (Phase 1c):** Machine identity (OS, shell,
distro) is computed once per session via ClientContext::detect() instead
of on every SSE request. ChatRequest wraps the per-turn message/session
payload, replacing the inline request body construction in
create_chat_stream.

**ToolDescriptor (Phase 1d):** Centralizes tool metadata — canonical
names, display verbs, progressive/past verbs, and client/server
classification — into static descriptors. Replaces four separate
name-to-text match sites and fixes a bug where is_client was checked
against 'file_read' (wrong) instead of 'read_file' (correct).

Also fixes several clippy warnings in modified code.
…rames

**State decomposition (Phase 2a):** AppState is replaced by three types
with clear ownership:

- Conversation: owns the event log and session_id. All pure query
  and event-manipulation methods (events_to_messages, current_command,
  has_any_command, etc.) move here.
- Interaction: owns ephemeral UI state (mode, is_input_blank,
  confirmation_pending, streaming_status, error).
- Session: the top-level type containing conversation, interaction,
  pending_tool_calls, exit_action, and stream_abort. Cross-cutting
  lifecycle methods (start_streaming, cancel_streaming, add_tool_call,
  etc.) stay here.

**Dispatch extraction (Phase 2b):** The 400-line match event in the
spawn_blocking loop is now a 5-line dispatch call. All 12 handlers are
named functions in a new tui/dispatch.rs module. inline.rs shrinks
from ~640 lines to ~240 lines.

**Stream launch centralization (Phase 2c):** The duplicated 8-line
stream launch ritual (present in ContinueAfterTools, SubmitInput, and
Retry) is replaced by a single launch_stream function that takes a
setup callback for pre-work. Each handler collapses to one line.

**Stream frame split (Phase 2d):** ChatStreamEvent is replaced by
StreamFrame with explicit Content (TextChunk, ToolCall, ToolResult)
and Control (Done, Error, StatusChanged) variants. run_chat_stream
now dispatches on frame type, with apply_content_frame and
apply_control_frame as separate functions.
Extracts three abstractions that untangle the ~180-line
on_check_tool_permission handler:

**PermissionResolver (permissions/resolver.rs):** Composes the
PermissionWalker + PermissionChecker into a single new/check API.
The handler no longer imports Walker, Checker, or Request directly.

**ToolOutcome + ClientToolCall::execute (tools/mod.rs):** Each tool
variant owns its execution logic. ReadToolCall::execute handles both
directory listing and file reading (previously only directory listing
worked — file reading was a no-op). Returns ToolOutcome::Success or
ToolOutcome::Error, replacing the inline match arms.

**PendingToolCall state transitions (tools/mod.rs):** mark_asking(),
mark_executing(), mark_denied() methods formalize the state machine
instead of direct enum variant assignment.

**Session::complete_tool_call (tui/state.rs):** Combines
add_tool_result + pending_tool_calls.retain into one method,
replacing the repeated cleanup pattern in the handler.

The handler drops from ~180 lines to ~60 lines.
Add the history database to AppContext and plumb it through dispatch
so client tools can run async database queries. AtuinHistory::execute
uses atuin-client's Database::search with fuzzy matching, the first
filter mode from the tool call, and configurable limit.

Also fix two pre-existing bugs that prevented client tools from
working end-to-end: AtuinHistory::matches_rule had a todo!() panic
that crashed the permission check task, and on_select_permission
Allow was discarding the tool call instead of executing it.
Format results with local timezone timestamp and human-readable
duration (e.g. 3s, 1m23s, 120ms) alongside command, cwd, and exit
code.
Parse shell commands with tree-sitter-bash (POSIX family) and
tree-sitter-fish to extract all subcommands from compound
expressions (&&, ||, pipes, subshells, $(...), etc). Wire into
ShellToolCall::matches_rule for scope matching.

Scope matching supports three wildcard styles:
- `ls *` (space before *): word-boundary match
- `ls*` (no space): prefix/glob match
- `git * amend` (middle *): matches zero+ words between segments

Also fixes: variable assignments excluded from command text,
fallback parser double-split bug, and shell field parsed from
tool call input for correct parser selection.
Add 77 adversarial tests covering nested substitutions, variable
assignments, control flow, redirections, subshells, background jobs,
real-world commands, fish-specific syntax, and scope matching edge
cases.

Fix: remove `concatenation` from BASH_LEAVES so that subcommands
inside argument concatenations (e.g. `make -j$(nproc)`) are properly
extracted.

Known limitations verified by tests:
- `find -exec` body is opaque to tree-sitter (not parsed as commands)
- `[` in test conditions is not extracted as a command
- `eval`/`exec`/`source` argument bodies are not recursively parsed
Shell tool calls now execute locally with a streaming VT100 preview in
the TUI. The full stdout and stderr are captured separately and sent to
the LLM as structured results with exit code and duration.

Key changes:
- ToolOutcome::Structured variant with separated stdout/stderr/exit code/duration
- execute_shell_command_streaming uses vt100 crate for ANSI/progress bar handling
- Shared execute_tool dispatch eliminates duplicated shell execution paths
- begin_tool_call/finish_tool_call for ToolCall persistence in chat output
- Ctrl+C interrupts running commands instead of exiting the app
- Viewport component in eye_declare for fixed-height tail rendering
Replace the split state model (pending_tool_calls VecDeque + preview
field on ConversationEvent::ToolCall) with a unified ToolTracker that
owns each tool call through its full lifecycle, including after
completion. This eliminates the preview copy-back dance, the two-place
preview lookup, and the TurnBuilder second-pass update_previews step.

Key changes:
- New ToolTracker/TrackedTool/ToolPhase types replace PendingToolCall/ToolCallState
- ConversationEvent::ToolCall drops its preview field (now purely API-facing)
- shell_abort_tx moves from Session to TrackedTool.abort_tx (per-tool, not per-session)
- TurnBuilder takes &ToolTracker reference, looks up previews inline
- Fix spinner not animating during shell preview (work around eye_declare
  interval reset by computing frame from system clock)
- Fix word wrapping in shell output preview (use truncation instead of
  word-boundary wrapping for VT100 content)
- Use multi-thread tokio runtime for AI commands
# Conflicts:
#	crates/atuin-ai/src/commands/inline.rs
@BinaryMuse BinaryMuse changed the title feat: Permission system for client tool calls feat: Client-tool execution + permission system Apr 9, 2026
@BinaryMuse
Copy link
Copy Markdown
Member Author

@greptileai Please review this draft

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 9, 2026

Greptile Summary

This PR decomposes the AI chat TUI into stream.rs, dispatch.rs, state.rs, and new tools/ + permissions/ modules, and adds client-side atuin_history tool execution with a file-based permission system and VT100 shell preview. The three previously-flagged issues (stub AlwaysAllow handlers, todo!() panic in ReadToolCall::matches_rule, and unchecked u64 subtraction) are all resolved in this revision.

  • The capabilities array sent to the server only contains client_v1_atuin_history, but ClientToolCall::try_from (called on every inbound tool_call event) accepts read_file, create_file, and execute_shell_command unconditionally \u2014 see comment on stream.rs:277.

Confidence Score: 4/5

Safe to merge with one security concern: unadvertised tools can be requested and queued for permission prompts by a non-compliant server.

All three previously-flagged P0/P1 issues are resolved. One new P1 remains: the capability gating on the receive path. The permission system's default-Ask behavior limits the blast radius, but the advertise-vs-enforce gap is a real correctness issue worth fixing before shipping to users who believe certain tools are disabled.

crates/atuin-ai/src/stream.rs — apply_content_frame capability check

Security Review

  • Capability bypass in stream.rs line 277: ClientToolCall::try_from accepts read_file, create_file, and execute_shell_command even when those capabilities were not advertised to the server. A compromised or buggy server can request shell/file tools and have them queued for permission prompts regardless of user-configured capability flags. The permission system (defaults to Ask) partially mitigates this but does not prevent the prompt from appearing.

Important Files Changed

Filename Overview
crates/atuin-ai/src/stream.rs SSE framing and content dispatch — accepted tool names include read_file, create_file, execute_shell_command regardless of advertised capabilities
crates/atuin-ai/src/tools/mod.rs Tool execution lifecycle, ToolTracker, shell streaming — previously-flagged todo!/subtraction issues are resolved; path_matches_scope handles Read/Write scopes correctly
crates/atuin-ai/src/tui/dispatch.rs Event dispatch — previously-flagged AlwaysAllow stubs are now implemented, writing rules and executing tools correctly
crates/atuin-ai/src/tui/state.rs Domain state and conversation event log — events_to_messages correctly handles mixed text+tool_call content blocks
crates/atuin-ai/src/permissions/check.rs Permission checker with ask→deny→allow priority ordering — default is Ask, which is the correct safe fallback
crates/atuin-ai/src/permissions/writer.rs Persists allow/deny rules to TOML files — correctly deduplicates and preserves formatting; acknowledged as not concurrent-safe but serialized by UI

Reviews (3): Last reviewed commit: "feedback" | Re-trigger Greptile

@socket-security
Copy link
Copy Markdown

socket-security bot commented Apr 10, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addedcargo/​glob-match@​0.2.110010093100100
Addedcargo/​vt100@​0.16.210010093100100
Updatedcargo/​eye_declare@​0.3.0 ⏵ 0.4.010010093100100

View full report

@atuinsh atuinsh deleted a comment from greptile-apps bot Apr 10, 2026
Comment on lines +277 to +289
if let Ok(tool) = ClientToolCall::try_from((name.as_str(), &input)) {
// Client-side tool — add to tracker and conversation, queue permission check
let id_for_event = id.clone();
handle.update(move |state| {
state.handle_client_tool_call(id_for_event, tool, input);
});
let _ = tx.send(AiTuiEvent::CheckToolCallPermission(id));
} else {
// Server-side tool — just add to conversation events
handle.update(move |state| {
state.add_tool_call(id, name, input);
});
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 security Capability gating is advertise-only, not enforced on receive

ClientToolCall::try_from accepts read_file, create_file, and execute_shell_command regardless of what was sent in the capabilities array. Only client_v1_atuin_history is in the default capabilities, but a server (compromised or otherwise) can still request the other tools and the client will parse and queue them for permission-checking. The PR description says these tools are "gated behind capability flags" — that's only true from the server's perspective.

The permission system defaults to Ask, so execution still requires user confirmation. But it means a user who believes shell execution is disabled could be prompted to grant it unexpectedly.

Consider filtering in apply_content_frame against the advertised capabilities, or add a ClientToolCall::is_capability_enabled check before queuing the permission event.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants